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EMPIRICAL DYNAMICS FOR LONGITUDINAL DATA 

By Hans-Georg Muller 1 and Fang Yao 2 

University of California, Davis and University of Toronto 

We demonstrate that the processes underlying on-line auction 
price bids and many other longitudinal data can be represented by 
an empirical first order stochastic ordinary differential equation with 
time-varying coefficients and a smooth drift process. This equation 
may be empirically obtained from longitudinal observations for a 
sample of subjects and does not presuppose specific knowledge of 
the underlying processes. For the nonparametric estimation of the 
components of the differential equation, it suffices to have available 
sparsely observed longitudinal measurements which may be noisy and 
are generated by underlying smooth random trajectories for each sub- 
ject or experimental unit in the sample. The drift process that drives 
the equation determines how closely individual process trajectories 
follow a deterministic approximation of the differential equation. We 
provide estimates for trajectories and especially the variance function 
of the drift process. At each fixed time point, the proposed empirical 
dynamic model implies a decomposition of the derivative of the pro- 
cess underlying the longitudinal data into a component explained by 
a linear component determined by a varying coefficient function dy- 
namic equation and an orthogonal complement that corresponds to 
the drift process. An enhanced perturbation result enables us to ob- 
tain improved asymptotic convergence rates for eigenfunction deriva- 
tive estimation and consistency for the varying coefficient function 
and the components of the drift process. We illustrate the differential 
equation with an application to the dynamics of on-line auction data. 

1. Introduction. Recently, there has been increasing interest in analyz- 
ing on-line auction data and in inferring the underlying dynamics that drive 
the bidding process. Each series of price bids for a given auction corresponds 
to pairs of random bidding times and corresponding bid prices generated 
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whenever a bidder places a bid [Jank and Shmueli (2005, 2006), Bapna, 
Jank and Shmueli (2008), Reddy and Dass (2006)]. Related longitudinal 
data where similar sparsely and irregularly sampled noisy measurements 
are obtained are abundant in the social and life sciences; for example, they 
arise in longitudinal growth studies. While more traditional approaches of 
functional data analysis require fully or at least densely observed trajectories 
[Kirkpatrick and Heckman (1989), Ramsay and Silverman (2005), Gervini 
and Gasser (2005)], more recent extensions cover the case of sparsely ob- 
served and noise-contaminated longitudinal data [Yao, Miiller and Wang 
(2005), Wang, Carroll and Lin (2005)]. 

A common assumption of approaches for longitudinal data grounded in 
functional data analysis is that such data are generated by an underlying 
smooth and square integrable stochastic process [Sy, Taylor and Cumberland 
(1997), Staniswalis and Lee (1998), Rice (2004), Zhao, Marron and Wells 
(2004), Hall, Miiller and Wang (2006)]. The derivatives of the trajectories 
of such processes are central for assessing the dynamics of the underlying 
processes [Ramsay (2000), Mas and Pumo (2007)]. Although this is difficult 
for sparsely recorded data, various approaches for estimating derivatives of 
individual trajectories nonparametrically by pooling data from samples of 
curves and using these derivatives for quantifying the underlying dynamics 
have been developed [Gasser et al. (1984), Reithinger et al. (2008), Wang, 
Li and Huang (2008), Wang et al. (2008)]. Related work on nonparametric 
methods for derivative estimation can be found in Gasser and Miiller (1984), 
Hardle and Gasser (1985) and on the role of derivatives for the functional 
linear model in Mas and Pumo (2009). 

We expand here on some of these approaches and investigate an empiri- 
cal dynamic equation. This equation is distinguished from previous models 
that involve differential equations in that it is empirically determined from a 
sample of trajectories, and does not presuppose knowledge of a specific para- 
metric form of a differential equation which generates the data, except that 
we choose it to be a first order equation. This stands in contrast to current 
approaches of modeling dynamic systems, which are "parametric" in the 
sense that a prespecified differential equation is assumed. A typical example 
for such an approach has been developed by Ramsay et al. (2007), where a 
prior specification of a differential equation is used to guide the modeling of 
the data, which is done primarily for just one observed trajectory. A problem 
with parametric approaches is that diagnostic tools to determine whether 
these equations fit the data either do not exist, or where they do, are not 
widely used, especially as nonparametric alternatives to derive differential 
equations have not been available. This applies especially to the case where 
one has data on many time courses available, providing strong motivation to 
explore nonparametric approaches to quantify dynamics. Our starting point 
is a nonparametric approach to derivative estimation by local polynomial 
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fitting of the derivative of the mean function and of partial derivatives of 
the covariance function of the process by pooling data across all subjects 
[Liu and Miiller (2009)]. 

We show that each trajectory satisfies a first order stochastic differen- 
tial equation where the random part of the equation resides in an additive 
smooth drift process which drives the equation; the size of the variance of 
this process determines to what extent the time evolution of a specific tra- 
jectory is determined by the nonrandom part of the equation over various 
time subdomains, and therefore is of tantamount interest. We quantify the 
size of the drift process by its variance as a function of time. Whenever the 
variance of the drift process Z is small relative to the variance of the process 
X , a deterministic version of the differential equation is particularly useful 
as it then explains a large fraction of the variance of the process. 

The empirical stochastic differential equation can be easily obtained for 
various types of longitudinal data. This approach thus provides a novel per- 
spective to assess the dynamics of longitudinal data and permits insights 
about the underlying forces that shape the processes generating the obser- 
vations, which would be hard to obtain with other methods. We illustrate 
these empirical dynamics by constructing the stochastic differential equa- 
tions that govern online auctions with sporadic bidding patterns. 

We now describe a data model for longitudinally collected observations, 
which reflects that the data consist of sparse, irregular and noise-corrupted 
measurements of an underlying smooth random trajectory for each subject 
or experimental unit [Yao, Miiller and Wang (2005)], the dynamics of which 
is of interest. Given n realizations Xi of the underlying process X on a 
domain T and iVj of an integer-valued bounded random variable N, we 
assume that iVj measurements Yy, i = 1 , . . . , n, j = 1 , . . . , Ni, are obtained at 
random times Tij, according to 

(1) Ylj — Xi(Tj,j) -\- £{j, T{j £ 7~, 

where Eij are zero mean i.i.d. measurement errors, with var(ejj) =cr 2 , inde- 
pendent of all other random components. 

The paper is organized as follows. In Section 2, we review expansions in 
eigenfunctions and functional principal components, which we use directly 
as the basic tool for dimension reduction — alternative implementations with 
B-splines or P-splines could also be considered [Shi et al. (1996), Rice and 
Wu (2001), Yao and Lee (2006)]. We also introduce the empirical stochastic 
differential equation and discuss the decomposition of variance it entails. 
Asymptotic properties of estimates for the components of the differential 
equation, including variance function of the drift process, coefficient of de- 
termination associated with the dynamic system and auxiliary results on 
improved rates of convergence for eigenfunction derivatives are the theme of 
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Section 3. Background on related perturbation results can be found in Daux- 
ois, Pousse and Romain (1982), Fine (1987), Kato (1995), Mas and Men- 
neteau (2003). Section 4 contains the illustration of the differential equation 
with auction data, followed by a brief discussion of some salient features 
of the proposed approach in Section 5. Additional discussion of some pre- 
liminary formulas is provided in Appendix A.l, estimation procedures are 
described in Appendix A. 2, assumptions and auxiliary results are in Ap- 
pendix A. 3 and proofs in Appendix A. 4. 

2. Empirical dynamics. 

2.1. Functional principal components and eigenfunction derivatives. A key 
methodology for dimension reduction and modeling of the underlying stochas- 
tic processes X that generate the longitudinal data, which usually are sparse, 
irregular and noisy as in (1), is Functional Principal Component Anal- 
ysis (FPCA). Processes are assumed to be square integrable with mean 
function E{X(t)} = fi(t) and auto-covariance function cov{X(t),X(s)} = 
G(t,s), s, t E.T, which is smooth, symmetric and nonnegative definite. Us- 
ing G as kernel in a linear operator leads to the Hilbert-Schmidt operator 
(Gf)(t) = f T G(t, s)f(s) ds. We denote the ordered eigenvalues (in declin- 
ing order) of this operator by Ai > A2 > • • • > and the corresponding or- 
thonormal eigenfunctions by 4>k(t)- We assume that all eigenvalues are of 
multiplicity 1 in the sequel. 

It is well known that the kernel G has the representation G(t, s) = 
J2T=i^k x </>fc(*)0fc( s ) and the trajectories generated by the process sat- 
isfy the Karhunen-Loeve representation [Grenander (1950)] Xi(t) = n{t) + 
Efcli Cifc^fcW- Here the £ ik = f T {Xi(t) - fj,(t)}(j> k (t) dt, k = 1, 2, . . . , i = 1, . . . , 
n, are the functional principal components (FPCs) of the random trajec- 
tories X{. The £fc are uncorrelated random variables with E(^k) = and 
= Afc, with ^2 k Xk < 00. Upon differentiating both sides, one obtains 

00 

(2) xi l \t)=^(t)+j2^k4 ) (t), 

k=l 

where (t) and (p^ (t) are the derivatives of mean and eigenfunctions. 

The eigenfunctions (fi k are the solutions of the eigen-equations J G(t,s) x 
(pk(s)ds = Afc(^fc(t), under the constraint of orthonormality. Under suitable 
regularity conditions, one observes 

G(t, s)(p k (s) ds = X k ^ cp k (t), 

<j> k 1 \t) = j-J^G(t,s)Ms)ds, 
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which motivates corresponding eigenfunction derivative estimates. A useful 
representation is 



cov 



oo 

{lW(t) ) lf^( S )} = ^A^ l) (i)^ ) ( S ) ) 

k=l 



(4) 

ui,u 2 G {0, l},s,t e T, 

which is an immediate consequence of the basic properties of the functional 
principal components For more details and discussion, we refer to Ap- 
pendix A.l. 

It is worthwhile to note that the representation (2) does not correspond to 
the Karhunen-Loeve representation of the derivatives, which would be based 
on orthonormal eigenfunctions of a linear Hilbert-Schmidt operator defined 
by the covariance kernel cov{X^ 1 \t),X^ 1 \s)}. A method to obtain this rep- 
resentation might proceed by first estimating cov{X^(t),X^\s)} using (4) 

for v\ = V2 = 1 and suitable estimates <p^p for eigenfunction derivatives, then 
directly decomposing cov{X^ 1 \t),X^ 1 \s)} into eigenfunctions and eigenval- 
ues. This leads to cov{X^\t), lW(s)} = YlkLi ^k, an< ^ the 



Karhunen-Loeve representation (i) 



/' 



(i) 



orthonormal eigenfunctions <f>k i [Liu and Miiller (2009)]. 



(*) + Efeli&fe.i^fc.iW, with 



2.2. Empirical stochastic differential equation. In the following we con- 
sider differentiable Gaussian processes, for which the differential equation 
introduced below automatically applies. In the absence of the Gaussian as- 
sumption, one may invoke an alternative least squares-type interpretation. 
Gaussianity of the processes implies the joint normality of centered processes 
{X(t) - n(t),xW(t) - at all points t £ T, so that 



XW(t) 
X(t) 

( 



(5) 



M «(t) 



^=1 

oo 



E^w 



/c=i 



No 



( 



\ 



( 



oo 

E 

fe=l 

oo 

E 

fc=l 



A^i 1} (t) 2 



i 

oo 



k=i 



This joint normality immediately implies a "population" differential equa- 
tion of the form E{X^(t) - ^{t)\X{t)} = P(t){X(t) - fi{t)}, as has been 
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observed in Liu and Miiller (2009); for additional details see Appendix A.l. 
However, it is considerably more interesting to find a dynamic equation 
which applies to the individual trajectories of processes A. This goal neces- 
sitates inclusion of a stochastic term which leads to an empirical stochastic 
differential equation that governs the dynamics of individual trajectories Aj. 

Theorem 1. For a differentiable Gaussian process, it holds that 

(6) x«(t) - M W(t) = mm) - Km + Z(t), t G T, 

where 

cov{xV(t),x(t)} Elti^4 ] (t)Mt) 

PU var{A(t)} Elli^kMt) 2 

(?) 1 d 

= 2^ lo g[vax{A(i)}], teT, 

and Z is a Gaussian process such that Z(t),X(t) are independent at each 
t€T and where Z is characterized by E{Z(t)} = and cov {Z (t), Z(s)} = 
G z (t, s), with 

oo oo 



fc=l fc=l 
(8) 

oo oo 

- £(>) yj wi"(o**w + ww £ ****(<)«*(»)• 



fc=i fc=i 



Equation (6) provides a first order linear differential equation which in- 
cludes a time-varying linear coefficient function /3(t) and a random drift 
process Z(t). The process Z "drives" the equation at each time t. It is 
square integrable and possesses a smooth covariance function and smooth 
trajectories. It also provides an alternative characterization of the individ- 
ual trajectories of the process. The size of its variance function var(Z(t)) 
determines the importance of the role of the stochastic drift component. 

We note that the assumption of differentiability of the process X in 
Theorem 1 can be relaxed. It is sufficient to require weak differentiabil- 
ity, assuming that A G W 1 ' 2 , where H 1 = W 1 ' 2 denotes the Sobolev space of 
square integrable functions with square integrable weak derivative [Ziemer 
(1989)]. Along these lines, equation (6) may be interpreted as a stochastic 
Sobolev embedding. Observe also that the drift term Z can be represented 
as an integrated diffusion process. Upon combining (2) and (6), and ob- 
serving that functional principal components can be represented as = 
y^kHk Jj-hk(u)dW(u), where hk is the kth eigenfunction of the Wiener 
process W on domain T = [0, T] and 7^ the associated eigenvalue, such a 
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representation is given by 

z{t ) = E \[w^ 2k ~ 1)7r jf sin { ^Vr^ } 1 ^ {t) " mm} dW{u) - 

Another observation is that the joint normality in (5) can be extended 
to joint normality for any finite number of derivatives, assuming these are 
well defined. Therefore, higher order stochastic differential equations can be 
derived analogously to (6). However, these higher-order analogues are likely 
to be much less relevant practically, as higher-order derivatives of mean and 
eigenfunctions cannot be well estimated for the case of sparse noisy data or 
even denser noisy data. 

Finally, it is easy to see that the differential equation (6) is equivalent to 
the following stochastic integral equation: 

X(t) = X(s) + {fx(t)-fx(s)} 
(9) + / (3(u){X(u) -fi(u)}du+ [ Z{u)du 



s 



for any s, t £ 7~, < s < t, 

in the sense that X is the solution of both equations. For a domain with left 
endpoint at time 0, setting s = in (9) then defines a classical initial value 
problem. Given a trajectory of the drift process Z and a varying coefficient 
function /?, one may obtain a solution for X numerically by Euler or Runga- 
Kutta integration or directly by applying the known solution formula for the 
initial value problem of an inhomogeneous linear differential equation. 

2.3. Interpretations and decomposition of variance. We note that equa- 
tions (6) and (9) are of particular interest on domains T or subdomains de- 
fined by those times t for which the variance function var{Z(t)} is "small." 
From (7) and (8) one finds 

V(t) =var{Z(t)} 

(var{X( 1) (t)}var{X(t)} - [cov{X^(t),X(t)}) 2 )/vav{X(t)} 

E A *(tf ) (*)) 2 E A ^w - \ E**^ w&w ' 

yk=l k=l lfc=l 



(10) 



/ k=l 

On subdomains with small variance function, the solutions of (6) will not 
deviate too much from the solutions of the approximating equation 

(11) xW {t ) - M W (t) = P(t){X(t) - M (t)}, t e T. 
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In this situation, the future changes in value of individual trajectories are 
highly predictable and the interpretations of the dynamic behavior of pro- 
cesses X obtained from the shape of the varying coefficient function (3(t) 
apply at the individual level. 

If (3(t) < 0, the dynamic behavior can be characterized as "dynamic re- 
gression to the mean"; a trajectory which is away from (above or below) 
the mean function fi at time t is bound to move closer toward the mean 
function \i as time progresses beyond t. Similarly, if j3(t) > 0, trajectories 
will exhibit "explosive" behavior, since deviations from the mean (above or 
below) at time t will be reinforced further as time progresses, so that tra- 
jectories are bound to move further and further away from the population 
mean trajectory. Intermediate cases arise when the function (3 changes sign, 
in which case the behavior will switch between explosive and regression to 
the mean, depending on the time subdomain. Another situation occurs on 
subdomains where both j3 and vai(Z(t)) are very small, in which case the 
deviation of the derivative of an individual trajectory from the population 
mean derivative will also be small which means that trajectory derivatives 
will closely track the population mean derivative on such subdomains. 

The independence of Z(t) and X(t) means that the right-hand side of (6) 
provides an orthogonal decomposition of X^\t) into the two components 
P(t)X(t) and Z(t) such that 

var{X (1) (i)} = (3(tf var{X(t)} +wai{Z{t)}. 

It is therefore of interest to determine the fraction of the variance of X"(t) 
that is explained by the differential equation itself, that is, the "coefficient 
of determination" 

n9 s 2 var{^)X(t)} _ var{Z(t)} 

1 ' {) var{X(D(t)} var{X(i)(i)}' 

which is seen to be equivalent to the squared correlation between X(t), X^ 1 ' (i), 

(13) R 2 (t) _ [cov{x(t),xW(t)}] 2 _ {EZi^\t)Mt)} 2 

var{X(i)}var{X«(i)} ^ A ^)2 ^ ^ ) (t)2 ' 

We are then particularly interested in subdomains of T where R 2 (t) is 
large, say, exceeds a prespecified threshold of 0.8 or 0.9. On such subdo- 
mains the drift process Z is relatively small compared to X^ (t) so that the 
approximating deterministic first order linear differential equation (11) can 
substitute for the stochastic dynamic equation (6). In this case, short-term 
prediction of X(t + A) may be possible for small A, by directly perusing the 
approximating differential equation (11). 

It is instructive to visualize an example of the function R 2 (t) for the case of 
fully specified eigenfunctions and eigenvalues. Assuming that the eigenfunc- 
tions correspond to the trigonometric orthonormal system {-v/2cos(2/c7rt), k = 
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Fig. 1. "Coefficient of determination" functions R 2 (t) (12), (13), quantifying the frac- 
tion of variance explained by the deterministic part of the dynamic equation (6), illustrated 
for the trigonometric basis {\/2 cos(2kirt), k = 1, 2, . . .} on [0,1] and eigenvalue sequences 
\k = fc~ 4 (solid) and Afc=2~ fc (dashed). 



1,2,.. .} on [0, 1], we find from (13) 

R 2 (t) = \ k k cos(2kTrt) sm(2kirt) 

/^2x k (cos(2kTrt)) 2 J2x k k(sm(2kTTt)) 2 , t€[0,l]. 

Choosing X k = k~ 4 ,\ k = 2~ k and numerically approximating these sums, 
one obtains the functions R 2 (t) as depicted in Figure 1. This illustration 
shows that the behavior of this function often will fluctuate between small 
and large values and also depends critically on both the eigenvalues and the 
shape of the eigenfunctions. 

3. Asymptotic properties. We obtain asymptotic consistency results for 
estimators of the varying coefficient functions /3, for the variance function 
var{Z(t)} of the drift process and for the variance explained at time t by the 
deterministic part (11) of the stochastic equation (6), quantified by R 2 (t). 
Corresponding estimators result from plugging in estimators for the eigen- 
values Afc, eigenfunctions <f> k and eigenfunction derivatives <p£ into the rep- 
resentations (7) for the function /3(t), (10) for the variance function of Z 
and (13) for R 2 {t). Here one needs to truncate the expansions at a finite 
number K = K(n) of included eigen-components. 

Details about the estimation procedures, which are based on local lin- 
ear smoothing of one- and two-dimensional functions, are deferred to Ap- 
pendix A. 2. Our asymptotic consistency results focus on L 2 convergence 
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rates. They peruse auxiliary results on the convergence of estimates of eigen- 
values, eigenfunctions and eigenfunction derivatives, complementing and im- 
proving upon related results of Liu and Miiller (2009), which were derived 
for convergence in the sup norm. Improved rates of convergence in the I? 
distance are the consequence of a special decomposition that we employ 
in the proofs to overcome the difficulty caused by the dependence of the 
repeated measurements. 

Required regularity conditions include assumptions for the distribution of 
the design points, behavior of eigenfunctions 4>k and eigenvalues A& as their 
order k increases and the large sample behavior of the bandwidths ha t o, h^i 
for the estimation of the mean function fi and its first derivative ^'(t), 
and hG,o,h,G,i for the estimation of the covariance surface and its partial 
derivative. We note that extremely sparse designs are covered, with only 
two measurements per trajectory; besides being bounded, the number of 
measurements Ni for the iih trajectory is required to satisfy P(Ni > 2) > 0. 

Specifically, for the observations (T^, Yij), i = 1, . . . , n, j = 1, . . . , iVj, made 
for the ith trajectory, we require that: 

(Al) TVj are random variables with Ni ' N, where N is a bounded pos- 
itive discrete random variable with and P{N > 2} > 0, and ({Tij,j € 
Ji},{Yij,j £ Ji}) are independent of Ni, for J,, C {1, . . . , Ni}. 

Writing Ti = (T a , . . . , T iNi f and Y< = (Y a , . . . , Y m ) T , the triples {T h Y i9 iVJ 
are assumed to be i.i.d. For the bandwidths used in the smoothing steps for 
H(t) and fi^(t) in (21), G(t,s) and G^'°\t,s) in (22), we require that, as 
n — > oo, 

(A2) max.(h li>0 ,h li>1 ,hG > o,hG,i) ->■ 0, nh^o -> oo, nh 3 ^ -> oo, nh 2 G ->■ oo, 
nh G 1 — > oo. 

To characterize the behavior of estimated eigenfunction derivatives (f>^(t), 
define 

(14) Si = Ai — A2, 5k = min(Aj_i — Xj, Xj — Xj+i), k>2. 

j<k 

For the kernels used in the local linear smoothing steps and underlying 
density and moment functions, we require assumptions (Bl) and (B2) in 
the Appendix. Denote the L 2 norm by ||/|| = { L- f 2 (t) dt} 1 / 2 , the Hilbert- 
Schmidt norm by ||$|| s = {f T f T {® 2 (t, s) dtds} 1 / 2 and also define ||$||^ = 

The following result provides asymptotic rates of convergence in the I? 
norm for the auxiliary estimates of mean functions and their derivatives as 
well as covariance functions and their partial derivatives, which are briefly 
discussed in Appendix A. 2. A consequence is a convergence result for the 
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eigenfunction derivative estimates (j>} , with constants and rates that hold 
uniformly in the order k > 1. 

Theorem 2. Under (Al) and (A2) and (B1)-(B3), /or z/G {0,1}, 



(15) 

\\G^-G^\ = O p {— \^ + h%\ 
For (jy}'(t) corresponding to of multiplicity 1, 



(16) 

1 r 1 . ^ , i ^ 1 , ,2 



\h{Vnh 2 G1 ' Sk\y/nhar 



o 



G,0 



where the O p (-) term in (16) is uniform in k > 1. 



An additional requirement is that variances of processes X and are 
bounded above and below, since these appear in the denominators of various 
representations, for example, in (10) and (13), 

(A3) 'miter G {v ' u) (s,s) >c>0 and \\G^ \\ u < oo for v = 0, 1, 

implying that < oo by the Cauchy-Schwarz inequality. Define re- 

mainder terms 

oo oo 

(17) R K ,u(t)= E W M (*)} 2 , = E A fe ^( S )^(t); 

fc=ff+l fc=_ftT+l 

by the Cauchy-Schwarz inequality, H-R^JIs < ||-Ra>||u- 

In order to obtain consistent estimates of various quantities, a neces- 
sary requirement is that the first K eigen-terms approximate the infinite- 
dimensional process sufficiently well. The increase in the sequence K = K(n) 
as n — > oo therefore needs to be tied to the spacing and decay of eigenvalues, 

(A4) K = o{mm{^h 2 Gtl ,hQl}), 

K 

E^ 1 = °( min (v / ^^G,0,/l G 2 o})' 
fc=l 

max \\Rw v\\ — > as n — >oo. 

v=0,l 

If the eigenvalues decrease rapidly and merely a few leading terms are 

p 

needed, condition (A4) is easily satisfied. We use "x" to connect two terms 
which are asymptotically of the same order in probability, that is, the terms 
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are O p of each other. Define the sequence 

(18) a n = m^hl^Y 1 + h 2 G>1 } + 6 k^ {(^G,o) _1 + h 2 Gfi }. 

Note that cov{X« (s), X« (f)} = G^ (s, t) = J2T=i A fctf } (a)^ (t) with 

corresponding plug-in estimate G K :1 \s,t) = Ylk=i ^fc$fe M^fc 1 ^(*)' where 
X = K(n) is the included number of eigenfunctions. The plug- in estimate for 

p{t) is based on (7) and given by $ K (t) = £f =1 (t)k(t) / T,k=l hM*) 2 

and analogously the plug-in estimate G Zt K of G 2 is based on represen- 
tation (8), using the estimate {3k. In a completely analogous fashion one 
obtains the estimates R K {t) of R 2 (t) from (13) and Vx(t) of the variance 
function V(t) = v&r(Z(t)) of the drift process from (10). The L 2 convergence 
rates of these estimators of various components of the dynamic model (6) 
are given in the following result. 

Theorem 3. Under (A1)-(A4) and (B1)-(B3), 
\\G K A) -G^\\ s = O p (a n + \\R K1 \\ s ), 

(19) 

WGk' 1 ' -GW\\ U = O p (a n + \\R K , 1 \\), 

WPk — ~ \\G z ,k - G z \\ s >c \\G Zj k - G z \\ u 

(20) £ \\R 2 K - R 2 \\ x \\V K - V\\ 

= O p (a n + \\Rk,o\\ + \\ r k,i\\)- 

The weak convergence and L 2 consistency for the estimated eigenval- 
ues {pk} and eigenfunctions {tpk} of the drift process Z is an immediate 
consequence of this result. To see this, one may use sup fc>1 \pk — pk\ = 

O p (\\G z - G z \\ s ) and \\ipk - ip k \\ = ^ _1 O p (||G 2 - G z \\ s ) where G z is any 
estimate of G z [Bosq (2000)]. Here the O p (-) terms are uniform in k and 
8\= px- p2, 81 = min j < fe (p j _i - pj, pj - p j+ i) for k > 2. 

4. Application to online auction data. 

4.1. Data and population level analysis. To illustrate our methods, we 
analyze the dynamic system corresponding to online auction data, specif- 
ically using eBay bidding data for 156 online auctions of Palm Personal 
Digital Assistants in 2003 (courtesy of Wolfgang Jank). The data are "live 
bids" that are entered by bidders at irregular times and correspond to the 
actual price a winning bidder would pay for the item. This price is usually 
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lower than the "willingness-to-pay" price, which is the value a bidder en- 
ters. Further details regarding the proxy bidding mechanism for the 7-day 
second-price auction design that applies to these data can be found in Jank 
and Shmueli (2005, 2006), Liu and Midler (2008, 2009). 

The time unit of these 7-day auctions is hours and the domain is the 
interval [0,168]. Adopting the customary approach, the bid prices are log- 
transformed prior to the analysis. The values of the live bids are sampled 
at bid arrival times Tij, where i = 1, ... ,156 refers to the auction index and 
j = 1, . . . ,Ni to the total number of bids submitted during the ith auction; 
the number of bids per auction is found to be between 6 and 49 for these 
data. We adopt the point of view that the observed bid prices result from 
an underlying price process which is smooth, where the bids themselves are 
subject to small random aberrations around underlying continuous trajec- 
tories. Since there is substantial variability of little interest in both bids and 
price curves during the first three days of an auction, when bid prices start 
to increase rapidly from a very low starting point to more realistic levels, 
we restrict our analysis to the interval [96, 168] (in hours), thus omitting the 
first three days of bidding. This allows us to focus on the more interesting 
dynamics in the price curves taking place during the last four days of these 
auctions. 

Our aim is to explore the price dynamics through the empirical stochastic 
differential equation (6). Our study emphasizes description of the dynamics 
over prediction of future auction prices and consists of two parts: a descrip- 
tion of the dynamics of the price process at the "population level" which 
focuses on patterns and trends in the population average and is reflected 
by dynamic equations for conditional expectations. The second and major 
results concern the quantification of the dynamics of auctions at the indi- 
vidual or "auction-specific level" where one studies the dynamic behavior 
for each auction separately, but uses the information gained across the en- 
tire sample of auctions. Only the latter analysis involves the stochastic drift 
term Z in the stochastic differential equation (6). We begin by reviewing 
the population level analysis, which is characterized by the deterministic 
part of (6), corresponding to the equation E(X^'(t) — fj,^\t)\X (t) — fJ,(t)) = 
/3(t){X(t) — n(t)}. This equation describes a relationship that holds for con- 
ditional means but not necessarily for individual trajectories. 

For the population level analysis, we require estimates of the mean price 
curve \i and its first derivative , and these are obtained by applying linear 
smoothers to (21) to the pooled scatterplots that are displayed in Figure 2 
(for more details, see Appendix A. 2). One finds that both log prices and log 
price derivatives are increasing throughout, so that at the log-scale the price 
increases are accelerating in the mean as the auctions proceed. 

A second ingredient for our analysis are estimates for the eigenfunctions 
and eigenvalues (details in Appendix A. 2). Since the first three eigenfunc- 
tions were found to explain 84.3%, 14.6% and 1.1% of the total variance, 
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Fig. 2. Smooth estimate of the mean function of log(Price) in the left panel and of its 
first derivative in the right panel. 



three components were selected. The eigenfunction estimates are shown in 
the left panel of Figure 3, along with the estimates of the corresponding 
eigenfunction derivatives in the right panel. For the interpretation of the 
eigenfunctions it is helpful to note that the sign of the eigenfunctions is 
arbitrary. We also note that variation in the direction of the first eigenfunc- 
tion cj)i corresponds to the major part of the variance. The variances Xi<pf(t) 




Fig. 3. Smooth estimates of the first (solid), second (dashed) and third (dotted) eigen- 
functions of process X (left panel) and of their derivatives (right panel), ^ (solid), <j>^ 
(dashed) and r^g 1 ' (dash- dotted). 
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that are attributable to this eigenfunction are seen to steadily decrease as t 
is increasing, so that this eigenfunction represents a strong trend of higher 
earlier and smaller later variance in the log price trajectories. 

The contrast between large variance of the trajectories at earlier times 
and smaller variances later reflects the fact that auction price trajectories 
are less determined early on when both relatively high as well as low prices 
are observed, while at later stages prices differ less as the end of the auction 
is approached and prices are constrained into a narrower range. Correspond- 
ingly, the first eigenfunction derivative is steadily increasing (decreasing if 
the sign is switched), with notably larger increases (decreases) both at the 
beginning and at the end and a relatively flat positive plateau in the middle 
part. 

The second eigenfunction corresponds to a contrast between trajectory 
levels during the earlier and the later part of the domain, as is indicated 
by its steady increase and the sign change, followed by a slight decrease at 
the very end. This component thus reflects a negative correlation between 
early and late log price levels. The corresponding derivative is positive and 
flat, with a decline and negativity toward the right endpoint. The third 
eigenfunction, explaining only a small fraction of the overall variance, reflects 
a more complex contrast between early and late phases on one hand and a 
middle period on the other, with equally more complex behavior reflected 
in the first derivative. 

The eigenfunctions and their derivatives in conjunction with the eigen- 
values determine the varying coefficient function /3, according to (7). The 
estimate of this function is obtained by plugging in the estimates for these 
quantities and is visualized in the left panel of Figure 5, demonstrating small 
negative values for the function (3 throughout most of the domain, with a 
sharp dip of the function into the negative realm near the right end of the 
auctions. 

For subdomains of functional data, where the varying coefficient or "dy- 
namic transfer" function j3 is negative, as is the case for the auction data 
throughout the entire time domain, one may interpret the population equa- 
tion E(X^\t) -nW(t)\X(t) = f3(t){X(t) -//(*)} as indicating "dy- 
namic regression to the mean." By this we mean the following: when a 
trajectory value at a current time t falls above (resp., below) the popula- 
tion mean trajectory value at t, then the conditional mean derivative of 
the trajectory at t falls below (resp., above) the mean. The overall effect of 
this negative association is that the direction of the derivative is such that 
trajectories tend to move toward the overall population mean trajectory as 
time progresses. 

Thus, our findings for the auction data indicate that "dynamic regression 
to the mean" takes place to a small extent throughout the auction period and 
to a larger extent near the right tail, at the time when the final auction price 
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is determined [see also Liu and Miiller (2009)]. One interpretation is that 
at the population level, prices are self-stabilizing, which tends to prevent 
price trajectories running away toward levels way above or below the mean 
trajectory. This self-stabilization feature gets stronger toward the end of the 
auction, where the actual "value" of the item that is being auctioned serves 
as a strong homogenizing influence. This means that in a situation where 
the current price level appears particularly attractive, the expectation is 
that the current price derivative is much higher than for an auction with an 
unattractive (from the perspective of a buyer) current price, for which then 
the corresponding current price derivative is likely lower. The net effect is 
a trend for log price trajectories to regress to the mean trajectory as time 
progresses. 

4.2. Auction- specific dynamics. We illustrate here the proposed stochas- 
tic differential equation (6). First estimating the function ft, we obtain the 
trajectories Z{ of the drift process. These trajectories are presented in Fig- 
ure 4 for the entire sample of auctions. They quantify the component of the 
derivative process that is left unexplained by the varying coefficient 
function and linear part of the dynamic model (6). The trajectories Zi ex- 
hibit fluctuating variances across various subdomains. The subdomains for 
which these variances are small are those where the deterministic approxima- 
tion (11) to the stochastic differential equation works best. It is noteworthy 
that the variance is particularly small on the subdomain starting at around 
158 hours toward the endpoint of the auction at 168 hours, since auction 
dynamics are of most interest during these last hours. It is well known that 




Time (hour) 

Fig. 4. Smooth estimates for the trajectories of the drift process Z . 
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Fig. 5. Left: smooth estimate of the dynamic varying coefficient function j3. Right: 
smooth estimates of the first (solid), second (dashed) and third (dash-dotted) eigenfunction 
of Z based on (8). 

toward the end of the auctions, intensive bidding takes place, in some cases 
referred to as "bid sniping," where bidders work each other into a frenzy to 
outbid each other in order to secure the item that is auctioned. 

The right panel of Figure 5 shows the first three eigenfunctions of Z, 
which are derived from the eigenequations derived from estimates G Zj k of 
covariance kernels G z ^k (8) that are obtained as described after (22). In 
accordance with the visual impression of the trajectories of Z in Figure 4, 
the first eigenfunction reflects maximum variance in the middle portion of 
the domain and very low variance at both ends. Interestingly, the second 
eigenfunction reflects high variance at the left end of the domain where 
prices are still moving upward quite rapidly, and very low variance near the 
end of the auction. This confirms that overall variation is large in the middle 
portion of the auctions, so that the drift process in (6) plays an important 
role in that period. 

Further explorations of the modes of variation of the drift process Z can be 
based on the functional principal component scores of Zi. Following Jones 
and Rice (1992), we identify the three auctions with the largest absolute 
values of the scores. A scatterplot of second and first principal component 
scores with these auctions highlighted can be seen in the left upper panel of 
Figure 6. The corresponding individual (centered) trajectories of the drift 
process Z are in the right upper panel, and the corresponding trajectories 
of centered processes X and X^ in the left and right lower panels. The 
highlighted trajectories of Z are indeed similar to the corresponding eigen- 
functions (up to sign changes), and we find that they all exhibit the typical 
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Fig. 6. Top left: point cloud corresponding to the three leading FPC scores of trajectories 
Zi, where the point marked by a circle corresponds to the auction with the largest (in 
absolute value) first score, the point marked with a square to the auction with the largest 
second score and the point marked with a "triangle" to the auction with the largest third 
score, respectively. Top right: the trajectories Zi of the drift process for these three auctions, 
where the solid curve corresponds to the trajectory of the "circle" auction, the dashed curve 
to the "square" auction and the dash-dotted curve to the "triangle" auction. Bottom left: 
corresponding centered trajectories Xi . Bottom right: corresponding centered trajectory 
derivatives X^ . 



features of small variance near the end of the auction for Z and X and of 
large variance for X^ l \ 

For the two trajectories corresponding to maximal scores for first and 
second eigenfunction of Z we find that near the end of the auctions their 
centered derivatives turn negative. This is in line with dynamic regression 
to the mean, or equivalently, negative varying coefficient function f3, as de- 
scribed in Section 4.1. Here the trajectories for X at a current time t are 
above the mean trajectory, which means the item is pricier than the aver- 
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Fig. 7. Left: smooth estimates of the variance functions of X^'(t) (dashed) and Z(t) 
(solid). Right: smooth estimate of R 2 (t) (12), the variance explained by the deterministic 
part of the dynamic equation at time t. 

age price at t. As predicted by dynamic regression to the mean, log price 
derivative trajectories at t are indeed seen to be below the mean deriva- 
tive trajectories at t. The trajectory corresponding to maximal score for the 
third eigenfunction also follows dynamic regression to the mean: here the 
trajectory for X is below the overall mean trajectory, so that the negative 
varying coefficient function (5 predicts that the derivative trajectory 
should be above the mean, which indeed is the case. 

That the variance of the drift process Z is small near the endpoint of the 
auction is also evident from the estimated variance function V(t) = vax(Z(t)) 
in the left panel of Figure 7, overlaid with the estimated variance function 
var(X^ 1 \t)) of X^\ The latter is rapidly increasing toward the end of the 
auction, indicating that the variance of the derivative process is very large 
near the auction's end. This means that price increases vary substantially 
near the end across auctions. The large variances of derivatives coupled 
with the fact that var(Z(t)) is small near the end of the auction implies 
that the deterministic part (11) of the empirical differential equation (6) 
explains a very high fraction of the variance in the data. This corresponds 
to a very high, indeed close to the upper bound 1, value of the coefficient of 
determination R 2 (t) (12), (13) in an interval of about 10 hours before the 
endpoint of an auction, as seen in the right panel of Figure 7. We therefore 
find that the dynamics during the endrun of an auction can be adequately 
modeled by the simple deterministic approximation (11) to the stochastic 
dynamic equation (6), which always applies. 
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Fig. 8. Regression of X^\t) on Xi(t) (both centered) at t — 125 hours (left panel) and 
t = 161 hours (right panel), respectively, with regression slopes /3(125) = —0.015 and coeffi- 
cient of determination i? 2 (125) = 0.28, respectively, /3(161) = -0.072 and i? 2 (161) = 0.99, 
demonstrating that the deterministic part (11) of the empirical differential equation (6) 
explains almost the entire variance of att — 161 hours but only a fraction of variance 
at t — 125 hours. 

This finding is corroborated by visualizing the regressions of XW(t) ver- 
sus X(t) at various fixed times t. These regressions are linear in the Gaussian 
case and may be approximated by a linear regression in the least squares 
sense in the non-Gaussian case. The scatterplots of X- 1 ' (t) — ft^ (t) versus 
Xi(t) — fi(t) for times t = 125 hours and t = 161 hours (where the time do- 
main of the auctions is between and 168 hours) are displayed in Figure 8. 
This reveals the relationships to be indeed very close to linear. These are 
regressions through the origin. The regression slope parameters are not es- 
timated from these scatterplot data which are contaminated by noise, but 
rather are obtained directly from (6), as they correspond to (3(t). Thus one 
simply may use the already available slope estimates, /3(125) = —0.015 and 
(3(1) = —0.072. The associated coefficients of determination, also directly es- 
timated via (13) and the corresponding estimation procedure, are found to 

be i? 2 (125) = 0.28 and i? 2 (161) = 0.99. 

As the regression line fitted near the end of the auction at t = 161 hours 
explains almost all the variance, the approximating deterministic differential 
equation (11) can be assumed to hold at that time (and at later times as well, 
all the way to the end of the auction). At t = 125 the regression line explains 
only a fraction of the variance, while a sizable portion of variance resides in 
the drift process Z, so that the stochastic part in the dynamic system (6) 
cannot be comfortably ignored in this time range. These relationships can be 
used to predict derivatives of trajectories and thus price changes at time t for 
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individual auctions, given their log price trajectory values at t. We note that 
such predictions apply to fitted trajectories, not for the actually observed 
prices which contain an additional random component that is unpredictable, 
according to model (1). We find that at time t = 161, regression to the mean 
is observed at the level of individual auctions: an above (below) average 
log price level is closely associated with a below (above) average log price 
derivative. This implies that a seemingly very good (or bad) deal tends to 
be not quite so good (or bad) when the auction ends. 

5. Discussion. The main motivation of using the dynamic system ap- 
proach based on (6) is that it provides a better description of the mechanisms 
that drive longitudinal data but are not directly observable. The empirical 
dynamic equation may also suggest constraints on the form of parametric 
differential equations that are compatible with the data. In the auction ex- 
ample, the dynamic equation quantifies both the nature and extent of how 
expected price increases depend on auction stage and current price level. 
This approach is primarily phenomenological and does not directly lend it- 
self to the task of predicting future values of individual trajectories. 

That expected conditional trajectory derivatives satisfy a first-order dif- 
ferential equation model (which we refer to as the "population level" since 
this statement is about conditional expectations) simply follows from Gaus- 
sianity and in particular does not require additional assumptions. This suf- 
fices to infer the stochastic differential equation described in (5) which we 
term "empirical differential equation" as it is determined by the data. Then 
the function R 2 , quantifying the relative contribution of the drift process Z 
to the variance of , determines how closely individual trajectories follow 
the deterministic part of the equation. We could equally consider stochastic 
differential equations of other orders, but practical considerations favor the 
modeling with first-order equations. 

We find in the application example that online auctions follow a dynamic 
regression to the mean regime for the entire time domain, which becomes 
more acute near the end of the auction. This allows us to construct predic- 
tions of log price trajectory derivatives from trajectory levels at the same t. 
These predictions get better toward the right endpoint of the auctions. This 
provides a cautionary message to bidders, since an auction that looks par- 
ticularly promising since it has a current low log price trajectory is likely 
not to stay that way and larger than average price increases are expected 
down the line. Conversely, an auction with a seemingly above average log 
price trajectory is likely found to have smaller than average price increases 
down the line. 

This suggests that bidders take a somewhat detached stance, watching 
auctions patiently as they evolve. In particular, discarding auctions that 
appear overpriced is likely not a good strategy as further price increases are 
going to be smaller than the average for such auctions. It also implies that 
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bid snipers are ill advised: a seemingly good deal is not likely to stay that 
way, suggesting a more relaxed stance. Conversely, a seller who anxiously 
follows the price development of an item, need not despair if the price seems 
too low at a time before closing, as it is likely to increase rapidly toward the 
end of the auction. 

For prediction purposes, drift processes Zi for individual auctions are of 
great interest. In time domains where their variance is large, any log price 
development is possible. Interestingly, the variance of drift processes is very 
small toward the right tail of the auctions, which means that the determin- 
istic part of the differential equation (6) is relatively more important, and 
log price derivatives during the final period of an auction become nearly 
deterministic and thus predictable. 

Other current approaches of statistical modeling of differential equations 
for time course data [e.g., Ramsay et al. (2007)] share the idea of modeling 
with a first order equation. In all other regards these approaches are quite 
different, as they are based on the prior notion that a differential equa- 
tion of a particular and known form pertains to the observed time courses 
and moreover usually have been developed for the modeling of single time 
courses. This established methodology does not take into account the co- 
variance structure of the underlying stochastic process. In contrast, this 
covariance structure is a central object in our approach and is estimated 
nonparametrically from the entire ensemble of available data, across all sub- 
jects or experiments. 

APPENDIX 

A.l. Additional details and discussion for preliminary formulas. For- 
mula (4) is an extension of the covariance kernel representation in terms of 
eigenfunctions, given by cov(X(t), X(s)) = ^2i^k't ) k{t) ( t ) k{s) [Ash and Gard- 
ner (1975)], which itself is a stochastic process version of the classical multi- 
variate representation of a covariance matrix C in terms of its eigenvectors ek 
and eigenvalues A&, C = ^ Afce^e^. Specifically, using representation (2), one 

finds cov(A^)(t),A^)( s )) = J2k,i^ k ,^ 1 \t)^ 2 \s), and (4) follows 
upon observing that cov(£fc, £/) = A/% for k = I and = for k^l. 

Regarding the "population differential equation" E{X^ l \t) — ^ l \t)\ 
X(t)} = f3{t){X{t) — fi(t)}, observe that for any jointly normal random vec- 
tors (Ui,U2) with mean and covariance matrix C with elements cn,ci2,C2i = 
ci2,C22, it holds that E{U2\U\) = (c2i/cn)Ui. Applying this to the jointly 
normal random vectors in (5) then implies this population equation. The 
specific form for the function f3 in (7) is obtained by plugging in the specific 
terms of the covariance matrix given on the right-hand side of (5). 

Applying (4), observing var(A(i)) =^ l( .Xk4 > k{t) 2 , and then taking the 

log-derivative leads to | log(var(A(t))) = 2E fc WkiWk* WE* ^l(t)}, 
establishing the last equality in representation (7). 
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A. 2. Estimation procedures. Turning to estimation, in a first step we 
aggregate measurements across all subjects into one "big" scatterplot and 
apply a smoothing method that allows us to obtain the ^th derivative of a 
regression function from scatterplot data. For example, in the case of local 
polynomial fitting, given a univariate density function K\ and bandwidth 
ha,v, one would minimize 



(21) V V kA ^— Ya - V CHntTy ~ t) 




for each t with respect to a m for m = 0, . . . , v + 1, from which one obtains 
= a v {t)v\ [Fan and Gijbels (1996)]. 
According to (3), we will also need estimates of ^G{t,s) = G^ v '°\ There 
are various techniques available for this task. Following Liu and Miiller 
(2009), to which we refer for further details, using again local polynomial 
fitting, we minimize the pooled scatterplot of pairwise raw covariances 



(22) 



=1 Kj/KNi 



hG,v ha,v 



\ 2^aim(2ii-tr + a2i(r« 



for fixed (t,s) with respect to a\ m and ai\ for m = 1, .. . ,v + 1, where 
Gi(Tij,Tu) = (Yij - fi(Tij))(Yii - p,(Tu)), j / I, k 2 is a kernel chosen as a bi- 
variate density function, and hQ v is a bandwidth. This leads to G^°\t, s) = 

OL\ v (t,8)v\. 

The pooling that takes place in the scatterplots for estimating the deriva- 
tives of \i and of G is the means to accomplish the borrowing of informa- 
tion across the sample, which is essential to overcome the limitations of the 
sparse sampling designs. We note that the case of no derivative v = is 
always included, and solving the eigenequations on the left-hand side of (3) 
numerically for that case leads to the required estimates Ai,A2,... of the 

eigenvalues and 0i,$2, • • • of the eigenfunctions. The estimates (j>i , 2 , • • • 
of the eigenfunction derivatives are then obtained from the right-hand side 
of (3), plugging in the estimates for eigenfunctions and eigenvalues, followed 
by a numerical integration step. 

The plug-in estimates, /3k, G Z: k, Vk,Rj(i are then obtained from the cor- 
responding representations, (7), (8), (10), (13), by including K leading com- 
ponents in the respective sums. While for theoretical analysis and asymp- 
totic consistency one requires K = K{n) — > oo, the number of included 
eigen-terms K in practical data analysis can be chosen by various crite- 
ria, for example, AIC/BIC based on marginal/conditional pseudo-likelihood 
or thresholding of the total variation explained by the included components 
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[Liu and Miiller (2009)]. One key feature of the covariance surface smooth- 
ing step in (22) is the exclusion of the diagonal elements (for which j = 1); 
the expected value for these elements includes the measurement error vari- 
ance a 1 in addition to the variance of the process. The difference between 
a smoother that uses the diagonal elements only and the resulting diagonal 
from the smoothing step (22) when no derivatives are involved can then be 
used to find consistent estimates for the error variance a 1 [Yao, Miiller and 
Wang (2005)]. 

To obtain estimates for the derivatives of the trajectories Xi, a realistic 
target is the conditional expectation (t)|i^i, . . . ,i^/Vj}. It turns out 

that this conditional expectation can be consistently estimated in the case of 
Gaussian processes by applying principal analysis by conditional expectation 
(PACE) [Yao, Miiller and Wang (2005)]. For X; = CXi(T a ), ■ ■ . , Xi(T m )) T , 
Yj = (Y a , . . . ,Y iNi ) T , ^ = (n(T il ),...,n(T iNi )) T , 4> ik = (4> k (Tn),..., 
(^kiTiNi)) 7 ', if iik and in (1) are jointly Gaussian, then by standard prop- 
erties of the Gaussian distribution, 

(23) £(£d Y) = X^J^y^Y, - fx*), 

where £^ = cov(Yj, Yj) = cov(Xi,X;) + a 2 I Ni . This implies E{x\ v \t)\Y a , 

■ ■ ■ M = ET=i Efa\Yj)$\t) = {ET=i x^ml^iY, - ^),u = 

0,1. The unknown quantities can be estimated by simply plugging in the 
variance, eigenvalue, eigenfunction and eigenfunction derivative estimates 
discussed above, again coupled with truncating the number of included com- 
ponents at K. 

A. 3. Additional assumptions and auxiliary results. Denote the densities 
of T and (T\, T2) by fi(t), f 2 (t, s), and define an interior domain by T = [a, b] 
with Ts = [a — 5, b + 5] for some 5 > 0. Regularity conditions for the densities 
and the targeted moment functions as well as their derivatives are as follows, 
where £±,£2 are nonnegative integers: 

(Bl) f[ 5 \t) exists and is continuous on Ts with f(t) > 0, 9t i®Q s t 2 h s ) 
exists and is continuous on Tg 2 for l\ + £2 = 5; 

(B2) /u( 5 )(i) exists and is continuous on Ts, dt e^Q s e 2 s ) exists and is con- 
tinuous on Ts for £1 + £2 = 5. 

We say that a bivariate kernel function n 2 is of order (v,£), where v is a 
multi-index u = (^1,^2), if 

/ / u ei v e2 K 2 (u,v)dudv 

(24) 

f0, < 4 +£ 2 < ^17^1,^ 7^2, 

= { {-itWvJ; 4 = ^1,^2 = ^2, 

1/0, £i+£ 2 = £, 
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where \u\ = v\ + U2<£. The univariate kernel k\ is said to be of order (zv, £) 
for a univariate v = u%, if (24) holds with £2 = on the right-hand side, 
integrating only over the argument u on the left-hand side. For the kernel 
functions n±, K2 used in the smoothing steps to obtain estimates for fj>(t) 
and /x (1) (t) in (21) and for G(t,s) and G^'°\t,s) in (22) we assume 

(B3) Kernel functions k± and K2 are nonnegative with compact supports, 
bounded and of order (0,2) and ((0,0), 2), respectively. 

The following lemma provides the weak L 2 convergence rate for univariate 
and bivariate weighted averages defined below. For arbitrary real functions 
6-M 2 -> K and fl* :K 4 K, define 0(f) = £{0(%, Y^)!?;.; = and = 
EiO^U^tiuYi^YuWij = t,T u = s}, let 9 v (t) = 0»(t)/i(i) for a single index 

v and 0* (f) = f2{t, s)g^-gu^9*(t, s) for a multi-index 1/ = (z/j, 1/2), and define 
the weighted kernel averages, employing bandwidths hi,h,2, 

(25) •^^EE^*-^} 



E{N)nh\- 
E{N(N-l)}nh% 



(26) 

For establishing convergence results for the general weighted averages (25), 
assume that: 

(B2 f ) Derivatives # w (i) exist and are continuous on 7~s, a+ef n to @* ft> ^) 



ists and is continuous on T? for ^1 + £2 



(B3^) The univariate kernel n\ is of order (v,£) and the bivariate kernel K2 
is of order (u,£). 

Lemma 1. Under (Al), (Bl), (B2t), (B3 f ) and max{/ti, /t 2 } -)• 0, 

7 2i/+l 1 1 2|i/|+2 

nn 1 00 ana nn 2 —> 00, as n — > 00, 

-o p ( 1 +ht v 

!nhf +1 



(27) 



1 



\\e - 9 v \\ a - o p y -^j^i + h 2 

A.4. Technical proofs. 



> 



Proof of Theorem 1. Since are jointly Gaussian processes, 

it is clear that Z is Gaussian. Formula (7) for /3(t) is obtained by ob- 
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serving that for joint Gaussian r.v.s, E{X^(t) - ^ (t)\X (t) - fi(t)} = 
[cov{X^(t),X(t)}/var{X(t)}]{X(t) -//(*)}• Then the properties of the 
functional principal component scores lead directly to 



cov{Xl'>(t),A'(()}=yjA t ^ 1) (()'#t(*). var{X(t)} = X> t *,(() : 



k=l k=l 



(28) 

whence p{t) = E{X^\t) - fi^(t)\X(t) - n(t)}. This implies E{Z(t)} = 0. 
According to (6), cov {Z(t),X(t)} = cov{X« (t),X(t)}- p(t) var{X(t)} = 0, 
for all t G T, using (28) and (7). This implies the independence of Z(t),X(t), 
due to the Gaussianity. Next observe cov {Z (t), Z(s)} = cov{X^ (t) — 
P(t)X{t),X<U ( s )-P(s)X(s)}, from which one obtains the result by straight- 
forward calculation. □ 

Proof of Lemma 1. Since Ni ~ ' N and N is a bounded and integer- 
valued random variable. Denote the upper bound by M. To handle the 
one-dimensional case in (25), we observe 

— 1 1 n /f-T-\ 

j = l 1 8=1 

M 1 

where l(-) is the indication function. Note that for each j, 9 v j is obtained 
from an i.i.d. sample. Slightly modifying the proof of Theorem 2 in Hall 
(1984) for a kernel of order (u, £) provides the weak convergence rate \\6 u j — 
0yP{N > j)\\ = O p {{nh\ p+l )- l l 2 + h[- v }. It is easy to check that Y,f=i P{N > 
j) = E(N), as N is a positive integer- valued random variable. Therefore, 



pi E ( N ) 



P(N>j) 



P [ , + hf 



Analogously, for the two-dimensional case in (26), let 

1 X "> / m m . - , - \ ft 1\i S Tj 



3 nh H+2 ^ 



d*(Tij,Tii,Yij,Yii)K2 (^-fp 1 , "—fT 1 ) 1{Ni ~ max(i ' 

a — i V 2 2 / 



and then #* = ^2i<j^i<]\j[E{N(N — 1)}] . Similarly to the above, one has 

\Kj ~ 6uP{N > max(J,l)}\\ 3 = Opiinh 2 ^ 2 )- 1 / 2 + h £ f H }. Again it is easy 
to verify that E{N(N - 1)} = YIk&km p { N ^ max (i> 01- The triangle in- 
equality for the L 2 distance entails \\9* — 9 V \\ 3 = O p {(nhl} u \ +2 )- 1 / 2 + h & 2 ~^}. 
□ 
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Proof of Theorem 2. Note that the estimators ft, G and G^ 1 ' ) 
all can be written as functions of the general averages defined in (25), (26). 
Slightly modifying the proof of Theorem 1 in Liu and Miiller (2009), with 
sup rates replaced by the L 2 rates given in Lemma 1, then leads to the 
optimal L 2 weak convergence rates for p? and in (15). 

For the convergence rate of <j)^ , Lemma 4.3 in Bosq (2000) implies that 



(29) 



|Afc-A fc |<||G-G| 



< 2V257 1 \\G-G\ 



where 5 k is defined in (14) and G is an arbitrary estimate (or perturbation) 
of G. Denote the linear operators generated from the kernels G^ 1 ' ) and 
G(i,o) by g(i,o) ( respectively, G^ 1 ' ). Noting that 5 k < X k , one finds 



t M||< J_||G(i,°)«j 



(30) 



&\ { ||G(i>o)_ G d,o) 

A k 



k - 4>k\\} + 



+ 



1 1 

c — Afc I 



p 1 

Afc 



i|gr(i,o) _ G ,(i,o)i 



1 



+ -||G-G| 

0k 



which implies (16). □ 



Proof of Theorem 3. From (29) it is easy to see that \X k — X k \ = 
Op(||<^fc - 4>k\\), and from both (29) and (30) that \\4> k ~ 4>k\\ = o p (\\4> k - 
||) uniformly in k. One then finds that HG^ 1 ' 1 ) — G^ 1 ' 1 )^ is bounded in 
probability by 



K 



+ 



k=l 



k=K+l 



K 



k=l 



\R 



which implies that \\G (l ^ - G (l ^ \\ s = O p (a n + \\R* K J), where a n is defined 
in (18) and the remainder terms in (17). Similar arguments lead to HG^ 1,1 ** — 
G^'^llu = O p (a n + H-Rf^ill), noting [| -^Ik" l II — ll-^^ill ^ ue *° * ne Cauchy- 
Schwarz inequality. 

Regarding \\(3k — j3\\, one has 



->K- 



< 



1 



K 



mf t G{t,t) 



Xk^k^ I 



+ 



\k=l 



K 



^2 ^k4>k4>k 



fc=i 
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+ 




E^n^-^ii + ii^oii 



Afc<; 



k=K+l 



K 



k=l 



applying the Cauchy-Schwarz inequality to || J2k=K+i ^k^k^k 
ing ^\\R K ,o\\ ■ \\Rk,i\\ < (\\Rk,o\\ + \\Rk,i\\}/2 yields \\$ K - 
\\Rk,o\\ + \\RkM\)- 
To study \\G Zi k — Gz 



. Observ- 
O p {a n + 



h 



K 



Pk A fc (; 



we investigate the L 2 convergence rates of 

oo 

(1) 



k=l 



k=l 



G K p K -pGPl 



where ft (resp., /3k) and 4> k (resp., 4> k ) share the same argument, and we 
define Gx(t,s) = EfcLi -^fc^)^^)- In analogy to the above arguments, 

h £ Wk - P\\ + Eti A fe ||^ 1} - 4 1} II + Vll^oll-ll^ill, ^2 = Wk -p\\ + 

Ylk=i ^kW'Pk - <Pk\\ + \\RK,o\\ - Tnis leads to \\Gz,K-G z \\ s = O p (a n + \\R K ,o\\ + 
\\Rk,i\\)- The same argument also applies to HGz^ — G z || u - Next we study 

2|| „„A C„A UX^ K X.li^lM ^oo x . A ( V1 ) ±(u 2 )\\ L 



\ R K 



R 2 



and find that \\^2 k=1 ^k<P k y k 

ELxMii^-^ll + ll^-^ 

||-Ric,i/ill + \\Rk,. 

the proof. □ 



+ \\<PV ~ 0*T1J + ll-Rtf^ill + ll-Rftr.^ll = o p («„ + 

u 2 \\j- Analogous arguments apply to \\Vk — V\\, completing 
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